The case for using mapped exonic non-duplicate reads when reporting RNA-sequencing depth: examples from pediatric cancer datasets

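Abstract

Background: The reproducibility of gene expression measured by RNA sequencing (RNA-Seq) is dependent on the sequencing depth. While unmapped or non-exonic reads do not contribute to gene expression quantification, duplicate reads contribute to the quantification but are not informative for reproducibility. We show that mapped, exonic, non-duplicate (MEND) reads are a useful measure of reproducibility of RNA-Seq datasets used for gene expression analysis.

Findings: In bulk RNA-Seq datasets from 2,179 tumors in 48 cohorts, the fraction of reads that contribute to the reproducibility of gene expression analysis varies greatly. Unmapped reads constitute 1–77% of all reads (median [IQR], 3% [3–6%]); duplicate reads constitute 3–100% of mapped reads (median [IQR], 27% [13–43%]); and non-exonic reads constitute 4–97% of mapped, non-duplicate reads (median [IQR], 25% [16–37%]). MEND reads constitute 0–79% of total reads (median [IQR], 50% [30–61%]).

Conclusions: Because not all reads in an RNA-Seq dataset are informative for reproducibility of gene expression measurements, and the fraction of reads that are informative varies, we propose reporting a dataset's RNA-Seq depth in MEND reads, which definitively inform the reproducibility of gene expression, rather than total, mapped, or exonic reads. We provide a Docker image containing (i) the existing required tools (RSeQC, sambamba, and samblaster) and (ii) a custom script to calculate MEND reads from RNA-Seq data files. We recommend that all RNA-Seq gene expression experiments, sensitivity studies, and depth recommendations use MEND units for sequencing depth.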
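The four percentages in the Findings each use a different denominator (all reads, mapped reads, mapped non-duplicate reads, and total reads), which is easy to misread. The sketch below is only an illustration of that arithmetic, not the authors' script from the Docker image. The `ReadCounts` class, its field names, and the example counts are hypothetical, and the counts are assumed to come from whatever alignment and duplicate-marking tools were run upstream.

```python
import math
from dataclasses import dataclass


@dataclass
class ReadCounts:
    """Hypothetical per-sample read counts (names are illustrative)."""
    total: int                 # all sequenced reads
    mapped: int                # reads aligned to the reference
    mapped_non_duplicate: int  # mapped reads remaining after duplicate removal
    mend: int                  # mapped, exonic, non-duplicate reads


def _ratio(numerator: int, denominator: int) -> float:
    """Divide, returning NaN when the denominator is zero (fraction undefined)."""
    return numerator / denominator if denominator else math.nan


def read_fractions(c: ReadCounts) -> dict:
    """Compute the four fractions, each against the denominator named in the abstract."""
    if not (c.total >= c.mapped >= c.mapped_non_duplicate >= c.mend >= 0):
        raise ValueError("expected total >= mapped >= mapped_non_duplicate >= mend >= 0")
    return {
        # unmapped reads as a fraction of all reads
        "unmapped_of_total": _ratio(c.total - c.mapped, c.total),
        # duplicate reads as a fraction of mapped reads
        "duplicate_of_mapped": _ratio(c.mapped - c.mapped_non_duplicate, c.mapped),
        # non-exonic reads as a fraction of mapped, non-duplicate reads
        "non_exonic_of_mapped_non_duplicate": _ratio(
            c.mapped_non_duplicate - c.mend, c.mapped_non_duplicate
        ),
        # MEND reads as a fraction of total reads
        "mend_of_total": _ratio(c.mend, c.total),
    }


# Made-up counts for one sample, purely for illustration.
example = ReadCounts(
    total=1_000_000, mapped=900_000, mapped_non_duplicate=600_000, mend=450_000
)
for name, value in read_fractions(example).items():
    print(f"{name}: {value:.1%}")
```

Reporting depth in MEND units then amounts to quoting `mend` (here 450,000 reads) rather than `total` (1,000,000 reads) for the sample.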

Similar resources

Filtering duplicate reads from 454 pyrosequencing data

MOTIVATION: In recent years, 454 pyrosequencing has emerged as an efficient alternative to traditional Sanger sequencing and is widely used in both de novo whole-genome sequencing and metagenomics. The latter application in particular is extremely sensitive to sequencing errors and artificially duplicated reads. Both are common in 454 pyrosequencing and can create a strong bias in the e...

Critique: "Filtering duplicate reads from 454 pyrosequencing"

The paper describes a novel approach for filtering duplicate reads from 454 pyrosequencing data. This problem is motivated by the need to reduce sequencing errors and artificially duplicated reads in some applications, such as de novo whole-genome sequencing or metagenomics. Existing solutions are often based on nucleotide sequences, while raw flowgram values, which contain additional information...

Quantifying uniformity of mapped reads

We describe a tool for quantifying the uniformity of mapped reads in high-throughput sequencing experiments. Our statistic directly measures the uniformity of both read position and fragment length, and we explain how to compute a P-value that can be used to quantify biases arising from experimental protocols and mapping procedures. Our method is useful for comparing different protoc...

Population Sequencing Using Short Reads: HIV as a Case Study

Despite many drawbacks, traditional sequencing technologies have proven to be invaluable in modern medical research, even when the targeted genomes are highly variable. While it is often known in such cases that multiple slightly different sequences are present in the analyzed sample in concentrations that vary dramatically, the traditional techniques typically allow only the most dominant stra...

Reduction of non-insert sequence reads by dimer eliminator LNA oligonucleotide for small RNA deep sequencing.

Here we describe a method for constructing small RNA libraries for high-throughput sequencing in which we have made a significant improvement to commonly available standard protocols. We added a locked nucleic acid (LNA) oligonucleotide--named dimer eliminator--that is complementary to the adapter-dimer ligation products during the reverse transcription reaction. It reduces adapter-dimers, whic...

Journal

Journal title: GigaScience

Year: 2021

ISSN: 2047-217X

DOI: https://doi.org/10.1093/gigascience/giab011